Finite mixture model of conditional dependencies modes to cluster categorical data

Abstract

We propose a parsimonious extension of the classical latent class model to cluster categorical data by relaxing the class conditional independence assumption. Under this new mixture model, named the Conditional Modes Model, variables are grouped into conditionally independent blocks. The corresponding block distribution is a parsimonious multinomial distribution in which the few free parameters correspond to the most likely modality crossings, while the remaining probability mass is spread uniformly over the other modality crossings. The proposed model thus brings out the intra-class dependency between variables and summarizes each class by a few characteristic modality crossings. Model selection is performed via a Metropolis-within-Gibbs sampler to overcome the computational intractability of the block structure search. As this approach involves computing the integrated complete-data likelihood, we propose a new method (exact for the continuous parameters and approximate for the discrete ones) that avoids the biases of the BIC criterion pointed out by our experiments. Finally, the parameters are estimated only for the best model, via an EM algorithm. The characteristics of the new model are illustrated on simulated data and on two biological data sets. These results strengthen the idea that this simple model reduces the biases induced by the conditional independence assumption and gives meaningful parameters. Both applications were performed with the R package CoModes.
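The block distribution described in the abstract can be illustrated with a small sketch. This is not the CoModes package or the authors' code; the function names `block_pmf` and `class_conditional_prob`, the integer coding of modalities, and the example numbers are all assumptions made for illustration. It only shows the idea of a few free "mode" probabilities with the leftover mass shared equally by the remaining modality crossings, and of multiplying block probabilities within a class.

```python
from itertools import product
from math import prod


def block_pmf(levels, modes):
    """Parsimonious multinomial over the modality crossings of one block.

    levels: number of modalities of each variable in the block (hypothetical coding 0..m-1).
    modes:  dict mapping a few crossings (tuples) to their free probabilities.
    All crossings not listed in `modes` share the leftover mass uniformly.
    """
    n_crossings = prod(levels)
    mode_mass = sum(modes.values())
    n_other = n_crossings - len(modes)
    if mode_mass > 1.0 + 1e-12:
        raise ValueError("mode probabilities sum to more than 1")
    if n_other == 0 and abs(mode_mass - 1.0) > 1e-12:
        raise ValueError("every crossing is a mode, so the modes must sum to 1")
    rest = (1.0 - mode_mass) / n_other if n_other > 0 else 0.0
    # The modes are meant to be the most likely crossings.
    if modes and min(modes.values()) < rest:
        raise ValueError("a mode is less likely than a non-mode crossing")
    crossings = product(*(range(m) for m in levels))
    return {x: modes.get(x, rest) for x in crossings}


def class_conditional_prob(observation, blocks, pmfs):
    """Probability of one observation within a class: product over blocks,
    since blocks are conditionally independent given the class."""
    p = 1.0
    for variables, pmf in zip(blocks, pmfs):
        p *= pmf[tuple(observation[j] for j in variables)]
    return p


# Illustrative class with three variables grouped into two blocks.
blocks = [(0, 1), (2,)]
pmf_a = block_pmf([2, 3], {(0, 0): 0.5, (1, 2): 0.3})  # 6 crossings, 2 modes
pmf_b = block_pmf([2], {(1,): 0.9})                    # 2 crossings, 1 mode
p_mode = class_conditional_prob((0, 0, 1), blocks, [pmf_a, pmf_b])
p_rare = class_conditional_prob((0, 1, 0), blocks, [pmf_a, pmf_b])
```

In this toy class the first block has six crossings but only two free parameters, which is the kind of parsimony the abstract refers to; a full mixture would weight such class-conditional probabilities by class proportions.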